Abstract

This paper studies the use of the Tsallis entropy versus the classic Boltzmann-Gibbs-Shannon entropy for classifying image patterns. Given a database of 40 pattern classes, the goal is to determine the class of a given image sample. Our experiments show that the Tsallis entropy, encoded in a feature vector for different q indices, has a great advantage over the Boltzmann-Gibbs-Shannon entropy for pattern classification, boosting recognition rates by a factor of 3. We discuss the reasons behind this success, shedding light on the usefulness of the Tsallis entropy.

Keywords: image pattern classification, texture, Tsallis entropy, 
non-additive entropy 



1. Introduction 

Image pattern classification is an important problem in a number of fields, e.g., for recognizing embryo development stages [1], determining protein profiles from fluorescence microscopy images [2], and identifying cellular structures [3]. This can be a challenging problem, especially for a large number of classes, and many different solutions have been proposed in the literature [1]. In this paper, however, our primary goal is to study the usefulness of the Tsallis entropy [4] by comparing it to the classic Boltzmann-Gibbs-Shannon (BGS) entropy when applied to classifying image patterns.

In statistical mechanics, the concept of entropy is related to the distribution of states, which can be characterized by the system energy levels. From an information-theoretic point of view, entropy is related to the lack of information about a system. It also represents how close a given probability distribution is to the uniform distribution, i.e., it is a measure of randomness that peaks at the uniform distribution itself. For image pattern classification, this interpretation can be useful since, for example, a symmetric, periodic or smooth image has fewer "possible states" than a more uniformly random image. A direct link between probability distributions and such a concept of entropy was proposed by Boltzmann and is the foundation of what is now known as the Boltzmann-Gibbs-Shannon (BGS) entropy. A well-known generalization of this concept, the Tsallis entropy [5], extends its application to so-called non-extensive systems using a new parameter $q$. It recovers the classic entropy for $q \to 1$, and is better suited for long-range interactions between states (e.g., in large pixel neighborhoods) and long-term memories. The Tsallis generalization of entropy has a vast spectrum of applications, ranging from physics and chemistry to computer science. For instance, using the non-extensive entropy instead of the BGS entropy can improve the results and efficiency of optimization algorithms [6], image segmentation [7, 8, 9] and edge detection algorithms [10].

In this paper we study the power of the Tsallis entropy, in comparison with the classic BGS entropy, for the construction of feature vectors for classifying image patterns or textures. Given a database of patterns, typically comprising 40 classes, the goal is to determine the class of a given image sample. Our experiments show that the Tsallis entropy encoded in a feature vector for different q indices offers a great advantage over the Boltzmann-Gibbs-Shannon entropy for this problem, boosting recognition rates by a factor of 3.

This paper is organized as follows. A review of the definitions and notation of fundamental concepts is given in Section 2. Details about the problem of image pattern classification and our approach based on the Tsallis entropy are described in Section 3. The basic experimental setup is provided in Section 4. In Section 5, experimental results analyzing the power of the Tsallis entropy for image pattern classification are described. Section 6 summarizes and concludes the paper.



2. Formulation and Notation 

Assume $p_i$ is the probability distribution, or normalized histogram, of the graylevels $g_i$, $i = 1, \ldots, W$, in the grayscale image $I$, i.e., $p_i$ equals the number of pixels having intensity $g_i$ divided by the total number of pixels. We assume $g_1 = 0$ stands for black and $g_{256} = 255$ for white. The number $W$ of different graylevels is typically 256 for 8-bit images. The BGS entropy is defined as¹

$$S_{BG} = -\sum_{i=1}^{W} p_i \ln p_i, \qquad \sum_{i=1}^{W} p_i = 1. \qquad (1)$$

In the special case of a uniform distribution, $p_i = 1/W$, so that $S_{BG} = \ln W$.
Similarly, the Tsallis entropy is defined as

$$S_q = \frac{1 - \sum_{i=1}^{W} p_i^q}{q - 1}, \qquad (2)$$

which recovers the BGS entropy in the limit $q \to 1$. The relation to the BGS entropy is made clearer by rewriting this definition in the form

$$S_q(p) = -\sum_{i=1}^{W} p_i^q \ln_q p_i, \qquad (3)$$

where

$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q} \qquad (4)$$

is called the q-logarithm, with $\ln_q(x) \to \ln x$ as $q \to 1$. For any value of $q > 0$, $S_q$ satisfies properties similar to those of the BGS entropy; for instance, $S_q \ge 0$, and $S_q$ attains its maximum at the uniform distribution.
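The definitions above can be checked numerically. The following is a minimal NumPy sketch, assuming an 8-bit integer image and natural logarithms; the function names (`graylevel_histogram`, `bgs_entropy`, `tsallis_entropy`, `q_log`, `tsallis_entropy_qlog`) are hypothetical and not taken from the paper. It evaluates Eq. (2) and the q-logarithm form of Eq. (3) side by side, so one can confirm that they agree and that $S_q$ approaches $S_{BG}$ as $q \to 1$.

```python
import numpy as np

def graylevel_histogram(image, levels=256):
    """Normalized histogram p_i of an integer grayscale image with values in [0, levels)."""
    counts = np.bincount(np.asarray(image).ravel(), minlength=levels)
    return counts / counts.sum()

def bgs_entropy(p):
    """Eq. (1): S_BG = -sum_i p_i ln p_i, skipping empty bins (0 ln 0 is taken as 0)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def tsallis_entropy(p, q):
    """Eq. (2): S_q = (1 - sum_i p_i^q) / (q - 1); uses Eq. (1) exactly at q = 1.
    Empty bins are dropped, which does not change the sum for q > 0."""
    p = p[p > 0]
    if q == 1:
        return bgs_entropy(p)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def q_log(x, q):
    """Eq. (4): ln_q(x) = (x^(1-q) - 1) / (1 - q), for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy_qlog(p, q):
    """Eq. (3): S_q = -sum_i p_i^q ln_q(p_i), for q != 1."""
    p = p[p > 0]
    return float(-np.sum(p ** q * q_log(p, q)))

# Example on a random 200 x 200 "image" (synthetic data, for illustration only).
rng = np.random.default_rng(0)
p = graylevel_histogram(rng.integers(0, 256, size=(200, 200)))
print(tsallis_entropy(p, 0.2), tsallis_entropy_qlog(p, 0.2))  # same value, two formulas
print(tsallis_entropy(p, 1 + 1e-6), bgs_entropy(p))           # nearly equal as q -> 1
print(bgs_entropy(np.full(256, 1 / 256)), np.log(256))        # uniform case: ln W
```

The explicit `q == 1` branch avoids the 0/0 form of Eq. (2) at the BGS limit.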

The BGS entropy is additive in the sense that, for two independent systems $A$ and $B$, the entropy of the whole system (the entropy of the sum) coincides with the sum of the entropies of the parts. This is not the case for the Tsallis entropy when $q \neq 1$, however. Formally,

$$S_{BG}(A + B) = S_{BG}(A) + S_{BG}(B), \qquad (5)$$



¹ We have dropped a constant of direct proportionality for the purpose of this paper.
while

$$S_q(A + B) = S_q(A) + S_q(B) + (1 - q)\, S_q(A)\, S_q(B). \qquad (6)$$
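Equations (5) and (6) can be verified on a toy example. This sketch assumes, as is standard for these identities, that $A$ and $B$ are independent, so that the composite distribution is the outer product of the two marginals; the helper names are hypothetical.

```python
import numpy as np

def tsallis(p, q):
    """Tsallis entropy, Eq. (2); BGS entropy, Eq. (1), when q == 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if q == 1:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def additivity_terms(a, b, q):
    """Return (S_q(A+B), right-hand side of Eq. (6)) for independent A and B."""
    joint = np.outer(a, b)                  # p(A = i, B = j) = a_i * b_j
    sa, sb = tsallis(a, q), tsallis(b, q)
    return tsallis(joint, q), sa + sb + (1.0 - q) * sa * sb

rng = np.random.default_rng(0)
a = rng.random(8); a /= a.sum()
b = rng.random(5); b /= b.sum()
for q in (0.2, 1.0, 2.0):
    print(q, *additivity_terms(a, b, q))    # both values agree; at q = 1 this is Eq. (5)
```

The cross term $(1 - q)\,S_q(A)\,S_q(B)$ vanishes only at $q = 1$, which is exactly where the ordinary additivity of Eq. (5) is recovered.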

3. Multi-q analysis 

The primary goal of image pattern classification is to assign a class label to a given image sample or window, the label being chosen from a predefined set of classes in a database. Figure 1 shows a schematic of the classification process.




Figure 1: Our approach to image pattern classification works by extracting the histogram of an input image sample, computing a vector of Tsallis entropies, and feeding that vector to a classifier that produces a class label among 40 classes stored in a database. The image samples in this figure are real samples illustrating the Brodatz database used in this paper.

In supervised classification, the classifier is trained on a set of samples known to belong to the classes (a priori knowledge). The classifier is then validated on another set of samples. This methodology can be used for pattern recognition tasks as well as for mathematical modeling. Traditionally, in image analysis, a feature vector is extracted from the image and used to train and validate the classifier. The feature vector is expected to concentrate the most important information about the image.

In this work we investigate the Tsallis entropy as a tool for analyzing image information and compare it to the traditional BGS entropy. Beyond statistical mechanics, the BGS entropy is also traditionally used in information theory, and it appears as a metric in many image analysis methods, for instance Gabor texture analysis, Fourier analysis, and wavelet and shape analyses, among many others. A classical and simple problem in image analysis is to treat the distribution of pixel intensities in an image as a measure of texture by analyzing its histogram. Such an approach has been used since the 1970s and, despite its simplicity, provides good results and is still the subject of active research [11]. Therefore, we have decided to use histogram texture analysis to investigate the potential of the Tsallis entropy, as used in information theory, in the context of images, and to compare its results with those obtained with the BGS entropy alone.

Histogram texture analysis begins by computing the image histogram $p_i$ of intensities, where $p_i$ is the (normalized) number of pixels in the image with intensity $i$. Assuming 8-bit grayscale images, it abstracts the image information into a feature vector of 256 dimensions. The histogram encodes a mixture of multiple intensity distributions representing luminosity patterns of image subsets, and is therefore a clear candidate for image pattern representation in a number of classification applications. Although the histogram is widely used in image analysis, it is limited by its simplicity. For instance, the spatial information is not preserved by the histogram: different images that have the same distribution of pixel intensities have the same histogram. Consider two images with equal numbers of black and white pixels: a checkerboard pattern and an image split in the middle into a black half and a white half. While the visual information they present is quite different, they have the same histogram.
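The checkerboard example can be reproduced directly. The sketch below builds two hypothetical 8 × 8 binary images, a checkerboard and a left/right split, each with 32 black and 32 white pixels, and confirms that their histograms (and therefore any histogram-based entropy) coincide even though the images differ.

```python
import numpy as np

def checkerboard(n=8):
    """n x n image alternating black (0) and white (255) pixels."""
    rows, cols = np.indices((n, n))
    return np.where((rows + cols) % 2 == 0, 0, 255).astype(np.uint8)

def half_split(n=8):
    """n x n image: left half black (0), right half white (255)."""
    img = np.zeros((n, n), dtype=np.uint8)
    img[:, n // 2:] = 255
    return img

a, b = checkerboard(), half_split()
hist_a = np.bincount(a.ravel(), minlength=256)
hist_b = np.bincount(b.ravel(), minlength=256)
print(np.array_equal(a, b))            # False: the images differ
print(np.array_equal(hist_a, hist_b))  # True: the histograms are identical
```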

Despite its limitations, the image histogram has been used for different purposes with good results, e.g., in image segmentation, image thresholding and pattern recognition. In this work, we focus on the third of these, applied to image classification. To classify an image or an image sample based on its histogram, statistical metrics such as the mean, mode, kurtosis and BGS entropy are traditionally employed. The simplicity and popularity of the image histogram thus help focus the classification results on the entropy analysis itself.

The concept of Tsallis entropy (and, in particular, of BGS entropy) defined in Section 2 provides ways to further abstract the information of the intensity histogram. For $q = 1$ we have the classic BGS entropy, strongly abstracting the 256-dimensional histogram into the extreme case of a single number, $S_1$. This paper explores multiple $q$ parameters towards forming better feature vectors for classification. We construct feature vectors of the form $\mathbf{S}_q = (S_{q_1}, \ldots, S_{q_n})$ (Figure 1), whose dimension $n$ (typically 4 to 20) provides a middle ground between the total abstraction of the 1D BGS entropy and the full 256-dimensional histogram. The experiments show that very few dimensions of Tsallis-entropy values in $\mathbf{S}_q$ are already enough to outperform the BGS entropy by a large factor.
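A minimal sketch of the feature extraction in Figure 1, mapping one image sample to $\mathbf{S}_q = (S_{q_1}, \ldots, S_{q_n})$. It assumes 8-bit integer images and the grid q = 0.1, 0.2, ..., 2 used later in Section 5; the function names are hypothetical.

```python
import numpy as np

def tsallis(p, q):
    """Eq. (2), with the BGS entropy of Eq. (1) at q == 1; empty bins are ignored."""
    p = p[p > 0]
    if q == 1:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def multi_q_features(image, qs, levels=256):
    """Map an integer grayscale image to the feature vector (S_{q_1}, ..., S_{q_n})."""
    counts = np.bincount(np.asarray(image).ravel(), minlength=levels)
    p = counts / counts.sum()
    return np.array([tsallis(p, q) for q in qs])

# q = 0.1, 0.2, ..., 2.0; rounding removes floating-point noise so that q == 1 is hit exactly.
qs = np.round(np.linspace(0.1, 2.0, 20), 10)

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(200, 200))   # synthetic stand-in for a texture window
features = multi_q_features(sample, qs)
print(features.shape)                            # (20,)
```

Because every $S_q$ is computed from the same histogram, adding more q values costs only a few extra vector operations per sample.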

4. Experimental Setup 

The database used to evaluate our approach was created from Brodatz's art book [12]. This book is a black-and-white photographic study for art and design, covering different patterns such as wood, grass and fabric, among others. The Brodatz database became popular in the imaging sciences and is widely used as a benchmark for the visual attribute of texture. The database used in our work consists of 40 texture classes, where each class is represented by a prototypical photograph of the texture containing no other patterns. These 512 × 512 images are scans of glossy prints that were acquired from the author. A given image sample to be classified is much smaller than the class prototype image: 200 × 200 in this paper. The 512 × 512 prototype image can generate numerous 200 × 200 image samples representing the same class using a sliding-window scheme. To construct our final database, we perform this sliding-window process to extract 10 representative image samples for each class (400 samples in total). Therefore, any incoming sample to be classified has the same size as the training windows. A few samples from this database are illustrated in Figure 1.
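The window extraction can be sketched as follows. The text does not state the window stride or how the 10 samples per class are chosen, so both the stride of 104 pixels (which covers a 512 × 512 image with a 4 × 4 grid of overlapping 200 × 200 windows) and the evenly spaced selection rule below are hypothetical.

```python
import numpy as np

def sliding_windows(image, size=200, stride=104):
    """All size x size windows whose top-left corners lie on a regular grid with the given stride."""
    h, w = image.shape
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]

def pick_samples(windows, n=10):
    """Hypothetical selection rule: n windows evenly spaced along the scan order."""
    idx = np.linspace(0, len(windows) - 1, n).round().astype(int)
    return [windows[i] for i in idx]

prototype = np.zeros((512, 512), dtype=np.uint8)   # placeholder for a Brodatz prototype image
samples = pick_samples(sliding_windows(prototype), n=10)
print(len(samples), samples[0].shape)               # 10 (200, 200)
```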

Several approaches to the task of classification have been proposed in the literature. Since our focus is on image representation, we used the simple and well-known Naive Bayes classifier. Although more sophisticated classifiers (e.g., multilayer perceptrons and support vector machines) have been shown to produce superior results, we are interested in showing the discrimination power provided by the features themselves rather than the classification power of the classifiers.
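For concreteness, here is a minimal naive Bayes classifier. The text does not specify the variant; this sketch assumes Gaussian class-conditional densities for each feature (a common choice for continuous features such as $S_q$ values) plus a small variance floor for numerical stability. The class name is hypothetical; scikit-learn's `GaussianNB` is an off-the-shelf alternative.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features are assumed independent given the class,
    and each feature is modeled by a class-specific 1D normal distribution."""

    def fit(self, X, y, var_smoothing=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor so that constant features do not cause division by zero.
        eps = var_smoothing * X.var(axis=0).max() + 1e-12
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Per-class log-likelihood, summed over features: shape (n_samples, n_classes).
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None], axis=2)
        # Maximum a posteriori class for each sample.
        return self.classes_[np.argmax(log_lik + self.log_prior_[None, :], axis=1)]

# Toy usage with two synthetic, well-separated classes in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
clf = GaussianNaiveBayes().fit(X, y)
print(clf.predict([[0.0, 0.0], [6.0, 6.0]]))   # [0 1]
```

Working in log space avoids underflow when many features are multiplied together.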

In order to objectively evaluate the performance of the Tsallis entropy versus the BGS entropy for classification, we use the stratified 10-fold cross-validation scheme [13]. In this scheme, the samples are randomly divided into 10 folds such that each fold contains the same proportion of each class (i.e., for the Brodatz dataset, each fold contains 40 samples, one of each class). At each run of this scheme, the classifier is trained using all but one fold and then evaluated on how it classifies the samples of the held-out fold. This process is repeated so that each fold is used once for validation. The performance is averaged, generating a single classification rate that represents the overall proportion of successes over all runs. A standard deviation is also computed and displayed when significant. In the next section, we also use confusion matrices to analyze the performance of specific classification strategies. The confusion matrix is very well known in statistical classification and artificial intelligence. It is an n × n matrix, where n is the number of classes, whose entry (i, j) expresses how many patterns of class i were labeled as class j. It allows analyzing the classification errors and identifying which classes are most often misclassified.
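The evaluation protocol can be sketched as below: stratified fold assignment, training on nine folds and testing on the tenth, and a pooled confusion matrix whose entry (i, j) counts samples of class i labeled as class j. The function names are hypothetical, the fold-assignment rule (shuffle each class, then deal its members round-robin) is one reasonable way to stratify, and the nearest-mean classifier is only a stand-in so that the example runs on its own; any object with `fit` and `predict` methods, such as a naive Bayes classifier, can be passed instead.

```python
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Fold id per sample; each class is shuffled and dealt round-robin over the k folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def cross_validate(X, y, make_classifier, k=10, seed=0):
    """Mean and standard deviation of per-fold accuracy, plus the pooled confusion matrix."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    index = {c: i for i, c in enumerate(classes)}
    folds = stratified_folds(y, k, seed)
    confusion = np.zeros((len(classes), len(classes)), dtype=int)
    accuracies = []
    for f in range(k):
        test = folds == f
        clf = make_classifier().fit(X[~test], y[~test])
        pred = clf.predict(X[test])
        for true_label, pred_label in zip(y[test], pred):
            confusion[index[true_label], index[pred_label]] += 1   # row: true, column: predicted
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies)), float(np.std(accuracies)), confusion

class NearestMean:
    """Stand-in classifier: predicts the class whose mean feature vector is closest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(0)
y = np.repeat(np.arange(40), 10)                        # 40 classes x 10 samples, as in Brodatz
X = y[:, None] + rng.normal(0.0, 0.1, size=(400, 3))    # synthetic, well-separated features
mean_acc, std_acc, cm = cross_validate(X, y, NearestMean)
print(mean_acc, std_acc, cm.shape)
```

With equal-sized folds, the mean of the per-fold accuracies equals the trace of the pooled confusion matrix divided by the number of samples.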

5. Experimental Results 

We have conducted two sets of experiments to evaluate texture recognition performance. The aim of the first set of experiments is to analyze the power of the Tsallis entropy with only one q value and compare its performance against that of the BGS entropy. The second set of experiments is devised to analyze the Tsallis entropy with a multi-q approach, including an analysis of different sets of q's.

5.1. Classification Results: single q 

To fairly compare the BGS and Tsallis entropies, we conducted an experiment using a single value of q, that is, with the feature vector reduced to a single number for the purpose of our study. Note, however, that a practical system would use a higher-dimensional feature vector (i.e., more numbers) to represent an image sample. On the Brodatz dataset, using the BGS entropy alone yielded a classification rate of 25% (±5.78). To compare it to the Tsallis entropy, we need to choose an appropriate value of q. The classification rates for different values of q are plotted in Figure 2. The best result, 32.75% (±6.17), was obtained by the Tsallis entropy with q = 0.2. Notice that any Tsallis entropy with q < 1 outperforms the BGS entropy. Moreover, in most cases, the classification rate decreases as the value of q increases.
Figure 2: Classification rate using a single value of q at a time.



We expected that, for texture images in general (i.e., beyond Brodatz), the highest classification rates would also be obtained for values of q close to 0.2. Table 1 presents the best values of q for different texture image datasets, including CUReT [14] (Columbia-Utrecht Reflectance and Texture Database), Outex from the University of Oulu [15], and VisTex from MIT. For all but the VisTex dataset, the best value of q was 0.2. For VisTex, the highest classification rate of 21.30% (±3.27) was obtained with q = 0.1, while q = 0.2 achieved a correct classification rate of 20.72% (±3.21). Moreover, the Tsallis entropy with q = 0.2 outperforms the BGS entropy on all datasets. Note that these results concern the effectiveness of a single number in abstracting the information of an image window. We stress that a practical system would likely represent a 200 × 200 image with a larger set of numbers (features).

5.2. Classification Results: multiple q 

One hypothesis for the power of the Tsallis entropy in pattern recognition is that each $S_q$ could hold different information about the pattern. Therefore, different values of q used together could improve classification rates. Indeed, Figure 3 shows that the $S_q$ curve can help in distinguishing the patterns: the feature vector $\mathbf{S}_q = (S_{0.1}, S_{0.2}, \ldots, S_2)$ is plotted for three different textures, giving an idea of the discriminating power of the Tsallis entropy.

| Dataset | Best q | Classification rate (Tsallis, best q) | Classification rate (BGS entropy) |
|---------|--------|---------------------------------------|-----------------------------------|
| Brodatz | 0.2    | 32.75% (±6.17)                        | 25.00% (±5.78)                    |
| VisTex  | 0.1    | 21.30% (±3.27)                        | 16.90% (±3.26)                    |
| CUReT   | 0.2    | 10.14% (±0.84)                        | 8.64% (±0.77)                     |
| Outex   | 0.2    | 14.19% (±2.37)                        | 13.38% (±2.03)                    |

Table 1: Classification results for different texture image datasets and the best value of q. The Tsallis entropy outperforms the BGS entropy for all datasets.

Since the $S_q$ curve varies exponentially with q, it is difficult to grasp the differences between the patterns. To improve the visualization of pattern behavior through the $S_q$ curve, we calculated the mean vector $\mu$, that is, the average $S_q$ curve of the 400 samples of the image database, and plotted the difference between $S_q$ and $\mu$ for 10 patterns picked at random (Figure 4). Notice that the smallest values of q give the best pattern discrimination, which agrees with the results of the single-q experiments, where q around 0.2 performs best.

To use the Tsallis entropy curve as a pattern recognition tool, we composed a feature vector $\mathbf{S}_q$ with q = 0.1:0.1:2 (i.e., q from 0.1 to 2 in increments of 0.1). Using this 20-element feature vector, a classification rate of 73.75% (±6.49) was achieved. We also constructed feature vectors using different ranges of values of q. The classification results are shown in Table 2. These results corroborate the hypothesis that a multi-q approach can improve the power of the Tsallis entropy applied to pattern recognition. The multi-q strategy using only 20 elements yields a gain of 41 percentage points over the best single value of q and a gain of 48.75 percentage points over the BGS entropy.

To visualize the behavior of the texture classes, we use the Karhunen-Loève transform (or principal component analysis, PCA). This allows us to project the feature vectors onto a lower-dimensional space that is easier to visualize and that retains as much of the variance as possible. PCA was applied to the feature vectors of the 400 samples (40 classes) of the Brodatz dataset, and the resulting scatter plot is shown in Figure 5. As we can see, the classes become organized into distinguishable clusters, illustrating the classification power of the multi-q approach.

Figure 3: The feature vector $(S_{0.1}, \ldots, S_2)$ for three different textures, giving an idea of the discriminating power of the Tsallis entropy.
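A minimal PCA projection, computed via the singular value decomposition of the centered feature matrix (rows are samples, columns are $S_q$ values). Whether the features were standardized before PCA is not stated, so this sketch only centers them; the function name is hypothetical and the feature matrix in the example is synthetic.

```python
import numpy as np

def pca_2d(F):
    """Project the rows of F (samples x features) onto the first two principal components."""
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value (explained variance).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return centered @ vt[:2].T, explained[:2]

rng = np.random.default_rng(0)
F = rng.normal(size=(400, 20)) * np.linspace(3.0, 0.1, 20)   # synthetic 400 x 20 feature matrix
scores, ratio = pca_2d(F)
print(scores.shape, ratio)   # (400, 2) and the variance fractions of PC1 and PC2
```

Because $S_q$ values for small q are much larger than those for large q, centering alone lets the small-q features dominate the projection; standardizing each column first is a common alternative.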

5.3. Feature selection: enhancing the discriminating power of the $S_q$ curve

We have noticed that composing a feature vector with different $S_q$'s can boost the discriminative power of the Tsallis entropy. Nevertheless, the feature vector was composed of q = 0.1, 0.2, ..., 2, and, as Figure 2 shows, some q values in this interval do not achieve the maximum classification rate. Therefore, a question arises: does the information from the entire interval of q help to distinguish the image patterns?

Feature selection is a technique used in multivariate statistics and pattern recognition that selects a subset of relevant features with the aim of improving the classification rate and its robustness. There are several feature selection algorithms; the reader can find a survey in [16]. In the present work we used a very simple strategy to clarify the influence of the different q values on the classification process. The main idea is to use only the $S_q$ values that contribute significantly to the classification rate.

Figure 4: Tsallis entropy versus q for different patterns. Each color and symbol combination represents a class of texture pattern.

Figure 5: To give a visual idea of the discriminating power of the Tsallis entropy, principal component analysis (PCA) was performed on Brodatz's 40 classes considering a 20-dimensional feature vector $(S_{0.1}, \ldots, S_2)$, and visualized as a 2-dimensional scatter plot of the first two principal components.

| Range of q    | # Features | Classification rate (%) |
|---------------|------------|-------------------------|
| 0.2           | 1          | 32.75 (±6.17)           |
| 0.5:0.5:2     | 5          | 52.75 (±6.27)           |
| 0.2:0.2:2     | 10         | 67.00 (±6.43)           |
| 0.1:0.1:2     | 20         | 73.75 (±6.21)           |
| 0.05:0.05:2   | 40         | 75.75 (±6.54)           |
| 0.01:0.01:2   | 200        | 77.25 (±6.49)           |
| 0.005:0.005:2 | 400        | 77.50 (±6.01)           |
| 0.001:0.001:2 | 2000       | 78.50 (±6.16)           |

Table 2: Classification results for different sets of q.

Figure 6(a) plots the classification rate obtained using the first n $S_q$'s for increasing n: for the first data point of the curve a single $S_q$ was used, $(S_{0.1})$; for the second data point, two $S_q$'s, $(S_{0.1}, S_{0.2})$; and at position n on the x axis, n $S_q$'s. The curve shows that some values of q improve the classification rate, while others do not increase it or even decrease it. To make the contribution of each $S_q$ easier to see, we take the derivative of the curve in Figure 6(a), shown in Figure 6(b). We performed feature selection by picking the q's whose values in the derivative curve are greater than a threshold t.
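The selection rule can be sketched as follows. Here `rates[n]` is the classification rate obtained with the first n + 1 features (the curve of Figure 6(a)), its first difference plays the role of the derivative in Figure 6(b), and a q is kept when its difference exceeds the threshold t. The text does not say how the derivative is taken at the first point or which t was used, so treating the first rate itself as the first difference is an assumption, and `rate_fn` stands for any hypothetical routine returning a cross-validated classification rate.

```python
import numpy as np

def cumulative_rates(F, y, rate_fn):
    """rates[n] = rate_fn applied to the first n + 1 columns (S_q values) of F."""
    return [rate_fn(F[:, : n + 1], y) for n in range(F.shape[1])]

def select_by_derivative(rates, qs, t):
    """Keep q_n whenever adding S_{q_n} raises the rate by more than t."""
    gains = np.diff(np.asarray(rates, dtype=float), prepend=0.0)  # discrete derivative
    return [q for q, g in zip(qs, gains) if g > t]
```

The selected q's then define a reduced feature vector, which is evaluated with the same cross-validation protocol as before.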




Figure 6: (a) Classification rate for feature vectors using the first $S_q$ elements: for a given value of q, the curve gives the classification rate when using $\mathbf{S}_q = (S_{0.1}, \ldots, S_q)$ as the feature vector, so the number of elements in the feature vector grows along the x axis. (b) The derivative of the curve in (a).

Feature selection reduces the number of features and increases the classification power of the multi-q entropy approach. Table 3 shows the results of feature selection over the multi-q approach for different step sizes of q in the interval up to q = 2. As can be observed, with just 4 elements the result is equivalent to that of the 20-element feature vector without feature selection (73.75%, see Table 2), and with 27 elements the feature selection approach outperforms the 2000-element feature vector (82% versus 78.50%, Table 2). These results show that feature selection substantially improves the performance of the multi-q approach. The main reason for this improvement is that the algorithm keeps only the significant $S_q$ elements.

The confusion matrices for representative approaches to entropy-based image classification investigated in this paper are shown as 3D plots in Figure 7. An arbitrary entry (x, y) of a confusion matrix expresses how many patterns of class x were labeled as class y, and this count is visualized as the height z in the figure. This visualization gives insight into the classification errors and into which classes are most often misclassified.



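| Range of q    | # Features | Classification rate (%) |
|---------------|------------|-------------------------|
| 0.5:0.5:2     | 4          | 52.75                   |
| 0.2:0.2:2     | 4          | 73.75                   |
| 0.1:0.1:2     | 6          | 80.00                   |
| 0.05:0.05:2   | 7          | 80.25                   |
| 0.01:0.01:2   | 21         | 81.50                   |
| 0.005:0.005:2 | 26         | 81.75                   |
| 0.001:0.001:2 | 27         | 82.00                   |

Table 3: Classification results for feature selection with different step sizes of q.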



6. Conclusion 

In this paper we showed how the Tsallis entropy can be used in image pattern classification with great advantage over the classic entropy. The parametrized Tsallis entropy enables higher-dimensional feature vectors built from different values of q, which yield vastly better performance than the BGS entropy alone. This indicates that the Tsallis entropy for different q encodes much more information from a given histogram than the BGS entropy does. In fact, one of our results shows that as few as 4 values of q, together, are enough to outperform the BGS entropy by a factor of about 3. Work to further analyze the implications of these results within a deeper information-theoretic framework is underway, shedding light on the usefulness of the Tsallis entropy for general problems of pattern recognition.
Figure 7: Confusion matrices for (a) BGS entropy alone, (b) Tsallis entropy with q = 0.2, (c) the multi-q approach using the feature vector $\mathbf{S}_q = (S_{0.1}, \ldots, S_2)$, and (d) the feature vector constructed with the feature selection technique described in the paper.